Optical Character Recognition from Degraded Document Images
Abstract
Segmenting text from badly degraded document images is a very challenging task because of the high inter- and intra-image variation between the document background and the foreground text across different types of document images. In this paper, a novel document image binarization technique is used to address these issues by means of adaptive image contrast. The adaptive image contrast is a combination of the local image contrast and the local image gradient, and it is tolerant to text and background variations caused by different types of document degradation. An adaptive contrast map is first constructed for an input degraded document image. The contrast map is then binarized and combined with Canny's edge map to identify the text stroke edge pixels. The document text is further segmented by a local threshold that is estimated from the intensities of the detected text stroke edge pixels within a local window. Vertical scanning is then applied to the binary document image to find the number of text lines, after which horizontal scanning is applied to find the number of characters in the image. Finally, each character is recognized using the discrete wavelet transform.
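The line and character counting step can be illustrated with a projection-profile scan. The following is only a minimal sketch, not the authors' implementation: it assumes the binarized page is a list of rows with 1 for text pixels and 0 for background, and the names `ink_runs`, `find_lines` and `find_characters` are hypothetical helpers introduced here for illustration.

```python
def ink_runs(profile):
    """Return (start, end) index pairs for each maximal run of non-zero entries."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i                      # a run of ink begins
        elif not value and start is not None:
            runs.append((start, i))        # the run ended at the previous index
            start = None
    if start is not None:                  # run reaches the end of the profile
        runs.append((start, len(profile)))
    return runs


def find_lines(binary):
    """Vertical scan: sum each row, then group consecutive inked rows into lines."""
    return ink_runs([sum(row) for row in binary])


def find_characters(binary, line):
    """Horizontal scan within one line band: group consecutive inked columns."""
    top, bottom = line
    band = binary[top:bottom]
    return ink_runs([sum(col) for col in zip(*band)])


# Toy binary page (assumed data): two text lines separated by a blank row.
page = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 0, 0],
]

lines = find_lines(page)
chars_per_line = [len(find_characters(page, line)) for line in lines]
print(len(lines), chars_per_line)
```

A scan of this kind separates characters only where at least one blank column lies between them, so touching or overlapping strokes would be counted as a single character.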
Similar Resources
Degraded Document Analysis and Extraction of Original Text Document: An Approach without Optical Character Recognition
Document Image Analysis recognizes text and graphics in documents acquired as images. An approach without Optical Character Recognition (OCR) for degraded document image analysis has been adopted in this paper. The technique involves document imaging methods such as Image Fusing and Speeded Up Robust Features (SURF) Detection to identify and extract the degraded regions from a set of document i...
Extraction of Original Text Document from a Set of Degraded Text Documents from the Same Source
Information extraction is the task of extracting structured data from a degraded document. It includes extracting data such as text, images or graphics from sources such as images, video or documents. Text detection and extraction from degraded documents finds application in a wide range of studies. In this paper, an Optical Character Recognition less (OCR-less) method of obtaining an origi...
A Quad Tree Based Binarization Approach to Improve quality of Degraded Document Images
This paper proposes a novel binarization algorithm for converting grayscale and color images into black-and-white images. Binarization is one of the most important processes in research on document image processing and pattern recognition. Since the quality of the binary image plays a critical role in the further processing of the document, especially in the ...
Binarization of Document Image
Document image binarization is performed in the preprocessing stage of document analysis, and it aims to segment the foreground text from the document background. A fast and accurate document image binarization technique is important for the ensuing document image processing tasks such as optical character recognition (OCR). Though document image binarization has been studied for many years, t...
Threshold Approach to Handwriting Extraction in Degraded Historical Document Images
Handwriting extraction is the ability of a system to receive and interpret intelligible handwritten input from sources such as documents, photos, touch screens and other devices. The image of the written document is used to detect the written text by optical scanning, known as optical character recognition. Handwriting extraction basically uses optical character recognition. Conversely, ...
Publication date: 2014